Abstract

In this paper we study a general class of online maximization problems, defined as follows.
We are given a time constraint T. We must choose a sequence of actions from a set of possible
actions, together with the length of time to run each action, subject to the total time being no more
than T. Each action has a marginal profit. We show that if the problem has the following two
properties, then a greedy algorithm yields at least $(1 - \frac{1}{e})$ of the optimal.

• Performing an action earlier does not decrease the marginal profit of the action. 

• Running a sequence of actions A followed by a sequence of actions B yields at least as 
much profit as the maximum profit of A or B.

The greedy algorithm also has the advantage that it can still be applied in many settings where 
complete knowledge of the problem is not available or in online settings where the input is 
revealed gradually. We also give examples of non-trivial problems, for some of which we are not 
aware of any better algorithm. 
1 Introduction



Submodularity over set functions is an important concept in combinatorial optimization. Many
classical discrete problems that admit greedy algorithms belong to this class. In economics as well,
submodularity has received considerable attention since it captures the notion of decreasing
marginal utilities. Discovering the characteristics of submodular functions is therefore of great
interest. Most of the instances in this class are NP-complete, though. As a result, most of the work
in the literature has focused on designing efficient approximation algorithms for these problems,
and the greedy approach has always been a natural choice leading to simple implementations.
Many have studied the behavior of greedy algorithms on maximizing non-decreasing submodular
functions, and it has been shown [13, 12, 14] that greedy algorithms achieve remarkably good
approximations for this task subject to some constraints. We should note, however, that in all the
previous work the submodularity property has been defined on set functions. On the other hand,
there are maximization problems that are defined on a sequence rather than a set, in which the
order of elements matters. There are also problems that are defined over continuous sequences.
In this paper, we define the notion of Sequence Submodularity for functions defined over sequences
(both continuous and discrete). We will show that if the objective sequence function is Sequence
Submodular, Non-Decreasing and, in the case of continuous sequences, Differentiable, then a greedy
approach achieves $(1 - \frac{1}{e})$ of the optimal solution for maximizing the objective function
under knapsack constraints (i.e., when there is a limit on the length of the sequence). At the end,
we present an application of this framework to internet advertising, more specifically to the online
ad allocation and query rewriting problems.

2 Related Work 

Submodularity and greedy algorithms Submodularity has been studied in more depth in 
recent years due to its applications to combinatorial auctions (e.g., the submodular welfare problem 
[9, 7]), generalized assignment problems [3], etc. 

The greedy approach is a natural tool for solving maximization problems with a submodular
objective function. Nemhauser and Wolsey [12] showed that the greedy approach gives an
$\frac{e}{e-1}$-approximation for maximizing a non-decreasing submodular function over a uniform
matroid. Nemhauser, Wolsey, and Fisher [13] considered this problem over independence systems.
They showed that if the independence system is the intersection of $M$ matroids, the greedy
algorithm gives an $(M + 1)$-approximation. Recently, Goundan and Schulz [5] generalized both of
these results and showed that if an $\alpha$-approximate incremental oracle is available, then the
greedy solution is an $e^{1/\alpha}/(e^{1/\alpha} - 1)$ approximation for maximizing a non-decreasing
submodular function over a uniform matroid and an $(\alpha M + 1)$-approximation for the
intersection of $M$ matroids. Feige et al. [2] gave a general framework for solving non-monotone
submodular problems.

Online allocation problem There is a considerable amount of literature in the economics and
computer science communities on adword auctions, considering different variations for improving
web and paid search results. The adword auction problem considered in this paper is the
online allocation problem. In the online allocation problem, the goal is to decide which ads to show
for each incoming query so that the profit obtained from the advertisers is maximized. Several
papers [11, 8] have studied this problem. Mehta et al. [11] presented a deterministic algorithm with
a competitive ratio of $(1 - \frac{1}{e})$ in the worst case model. It can be shown that the competitive
ratio of the greedy algorithm is $\frac{1}{2}$ in the worst case analysis. Later, Goel et al. [4] showed that
the competitive ratio of the greedy approach in the random permutation model as well as the i.i.d.
model is $(1 - \frac{1}{e})$ and that, in fact, the analysis is tight. Their proof is partly based on the
techniques used in [6] for the online bipartite matching problem. The offline variant of ad allocation
has been studied in [1, 3]. It has also been shown that the problem is NP-complete with a best known
approximation factor of $(1 - \frac{1}{e})$, and the results still hold even if bids are not very small
compared to budgets. (If the bids are very small compared to budgets, the solution obtained by LP
rounding has an approximation factor very close to 1.)

3 Our Contribution 

In previous work, the submodularity property is defined only on functions over sets. Nevertheless,
there are problems in which the goal is to choose a sequence of actions to maximize some
utility function defined over that sequence. In some of these problems, the order of actions matters.
Also, sometimes the actions are continuous and each action is used for some specified duration.
Such problems cannot be modeled using a submodular set function. In the rest of this
paper, we define and characterize the conditions on sequence functions under which
we can obtain the same conclusions about the behavior of a greedy approach over this class of
functions. A series of operations with the property that each operation is performed for some specified
duration can be seen as a continuous sequence. What we will show is that if a sequence function
has the three properties of being "non-decreasing", "Sequence Submodular" and "differentiable",
a greedy approach always achieves a solution that is at least $(1 - \frac{1}{e})$ of the optimal solution
for the maximization problem subject to a constraint on the maximum length of the solution sequence.
As an example, we show that the online ad allocation problem with a fixed distribution of keywords
over time can be modeled as maximizing a continuous non-decreasing submodular sequence
function, for which we can guarantee that the greedy approach achieves at least $(1 - \frac{1}{e})$ of
the optimal. For the problem of query rewriting, as explained in section 9, we achieve a
$1 - \frac{1}{e^{1 - 1/e}} \approx 0.47$ approximation, improving upon the $\frac{1}{4}$ approximation of [10].

4 Model 

Here we define the notation that we will use throughout the rest of this paper: 

Discrete Sequence: Let $S$ be a finite set. Any $A = (s_1, \ldots, s_k)$ where $k \in \mathbb{N} \cup \{0\}$ and $s_i \in S$
is called a discrete sequence of elements of $S$ ($k = 0$ is the empty sequence). We also denote
the set of all finite discrete sequences of $S$ by $\mathbb{H}^D(S)$, which is formally defined as:

$\mathbb{H}^D(S) = \{A = (s_1, \ldots, s_k) \mid k \in \mathbb{N} \cup \{0\},\ s_i \in S\}$ (4.1)

Notice that a discrete sequence actually defines a discrete function from $\{1, \ldots, k\}$ to $S$, and
any such discrete function can be represented using a discrete sequence. We denote the value
of the function defined by discrete sequence $A$ at point $x$ by $A(x)$.
Continuous Sequence: Let $S$ be a finite set. Any $A = ((s_1, \Delta t_1), \ldots, (s_k, \Delta t_k))$ where
$k \in \mathbb{N} \cup \{0\}$, $s_i \in S$ and $\Delta t_i \in \mathbb{R}^+$ is called a finite continuous sequence of elements of $S$. We
also denote the set of all finite continuous sequences of $S$ by $\mathbb{H}^C(S)$, which is formally defined
as:

$\mathbb{H}^C(S) = \{A = ((s_1, \Delta t_1), \ldots, (s_k, \Delta t_k)) \mid k \in \mathbb{N} \cup \{0\},\ s_i \in S,\ \Delta t_i \in \mathbb{R}^+\}$

Notice that a continuous sequence actually defines a function from $[0, \sum_{i=1}^{k} \Delta t_i)$ to $S$ in which
any $x \in [\sum_{i=1}^{j-1} \Delta t_i, \sum_{i=1}^{j} \Delta t_i)$ is mapped to $s_j$. Also notice that any function from $[0, T)$ to
$S$ in which the output changes a finite number of times when the input changes continuously
from 0 to $T$ can also be represented using a finite continuous sequence. We denote the value
of the function defined by continuous sequence $A$ at point $x$ by $A(x)$.

Sequence Function: Let $S$ be a finite set. Any function $u : \mathbb{H}^D(S) \to \mathbb{R}$ is called a sequence
function (discrete). Also, any function $u : \mathbb{H}^C(S) \to \mathbb{R}$ is called a sequence function
(continuous).

Length of a Sequence: We denote the length of a sequence $A$ by $|A|$, which we define next. For
any discrete sequence $A = (s_1, \ldots, s_k)$ we define $|A| = k$. For any continuous sequence
$A = ((s_1, \Delta t_1), \ldots, (s_k, \Delta t_k))$ we define $|A| = \sum_{i=1}^{k} \Delta t_i$.

Equivalence of Sequences: We say two sequences $A$ and $B$ are equivalent, and denote that by
$A = B$, if they represent the same sequence, that is, if and only if they have the same length
and their corresponding functions have the same value at every point in their domain. The
formal definition is given next.

If $A$ and $B$ are two discrete sequences, then $A = B$ if and only if $|A| = |B|$ and
$\forall i \in \{1, \ldots, |A|\} : A(i) = B(i)$.

If $A$ and $B$ are two continuous sequences, then $A = B$ if and only if $|A| = |B|$ and
$\forall x \in [0, |A|) : A(x) = B(x)$.

Concatenation of Sequences: We denote the concatenation of two sequences $A$ and $B$ by $A \perp B$.

Refinement of a Sequence: We denote the portion of a discrete sequence $A$ in $[x, y]$ by $A_{[x,y]}$
and the portion of a continuous sequence $A$ in $[x, y)$ by $A_{[x,y)}$, which we formally define
as follows.

For a discrete sequence $A = (s_1, \ldots, s_k)$, if the intersection of $[1, k]$ and $[x, y]$ is empty we
define $A_{[x,y]}$ to be the empty sequence. Otherwise suppose $[f, l]$ is the intersection of the two;
then we define $A_{[x,y]} = (s_f, \ldots, s_l)$.

For a continuous sequence $A = ((s_1, \Delta t_1), \ldots, (s_k, \Delta t_k))$, if the intersection of $[0, |A|)$ and
$[x, y)$ is empty we define $A_{[x,y)}$ to be the empty sequence. Otherwise suppose $[f, l)$ is their
intersection; then we define:

$A_{[x,y)} = ((s_p, \Delta t_p - \delta), (s_{p+1}, \Delta t_{p+1}), \ldots, (s_{q-1}, \Delta t_{q-1}), (s_q, \Delta t_q - \delta'))$ (4.2)

where $p, q \in \mathbb{N}$ and $\delta, \delta' \in \mathbb{R}^+ \cup \{0\}$ are chosen such that:

$\sum_{i=1}^{p-1} \Delta t_i \le f < \sum_{i=1}^{p} \Delta t_i$ (4.3)

$\sum_{i=1}^{q-1} \Delta t_i < l \le \sum_{i=1}^{q} \Delta t_i$ (4.4)

$\delta = f - \sum_{i=1}^{p-1} \Delta t_i$ (4.5)

$\delta' = \sum_{i=1}^{q} \Delta t_i - l$ (4.6)
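
The definitions of length, point evaluation and refinement for continuous sequences can be illustrated with a small sketch. It assumes a continuous sequence is stored as a list of (action, duration) pairs; the names `ContSeq`, `length`, `value_at` and `refine` are hypothetical and chosen only for this illustration.

```python
from typing import Hashable, List, Tuple

# Assumed representation: a continuous sequence is a list of
# (action, duration) pairs with every duration > 0.
ContSeq = List[Tuple[Hashable, float]]

def length(seq: ContSeq) -> float:
    """|A|: the sum of all durations."""
    return sum(dt for _, dt in seq)

def value_at(seq: ContSeq, x: float) -> Hashable:
    """A(x): the action that is active at time x, for 0 <= x < |A|."""
    start = 0.0
    for s, dt in seq:
        if start <= x < start + dt:
            return s
        start += dt
    raise ValueError("x is outside [0, |A|)")

def refine(seq: ContSeq, x: float, y: float) -> ContSeq:
    """A_[x,y): the portion of A that falls inside [x, y)."""
    out: ContSeq = []
    start = 0.0
    for s, dt in seq:
        lo = max(x, start)        # clip each piece to [x, y)
        hi = min(y, start + dt)
        if hi > lo:
            out.append((s, hi - lo))
        start += dt
    return out

A = [("a", 2.0), ("b", 3.0)]
print(length(A), value_at(A, 2.5), refine(A, 1.0, 4.0))
```

Clipping each piece to $[x, y)$ plays the role of subtracting $\delta$ from the first piece and $\delta'$ from the last piece in (4.2).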

Domination of Sequences: We say sequence $A$ is dominated by sequence $B$, and we denote that
by $A \prec B$, if we can cut out parts of $B$ to get $A$. Next we give a formal definition.

If $A$ and $B$ are discrete sequences then $A \prec B$ if and only if $A$ is a subsequence of $B$.

If $A$ and $B$ are continuous sequences then $A \prec B$ if and only if there exist $m \in \mathbb{N}$ and
$0 \le x_1 \le x_2 \le \cdots \le x_{2m} \le |B|$ such that:

$A = B_{[x_1,x_2)} \perp \cdots \perp B_{[x_{2m-1},x_{2m})}$ (4.7)

Marginal Value of a Sequence Function: For a sequence function $u : \mathbb{H}(S) \to \mathbb{R}$ we define
$u(B|A) = u(A \perp B) - u(A)$ where $A, B \in \mathbb{H}(S)$.

Throughout this paper we will use $\emptyset$ to denote the empty sequence. We will also use $\mathbb{H}(S)$
instead of $\mathbb{H}^C(S)$ and $\mathbb{H}^D(S)$ when a proposition applies to both discrete sequences and
continuous sequences.
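
For discrete sequences, domination and marginal value can be sketched in a few lines. The sketch assumes sequences are Python tuples and uses a toy sequence function (the number of distinct elements) purely as an example; `is_dominated`, `marginal` and `distinct` are hypothetical names.

```python
from typing import Callable, Tuple

Seq = Tuple[str, ...]

def is_dominated(a: Seq, b: Seq) -> bool:
    """A is dominated by B (discrete case): A is a subsequence of B."""
    it = iter(b)
    # `x in it` advances the iterator, so elements must appear in order.
    return all(x in it for x in a)

def marginal(u: Callable[[Seq], float], b: Seq, a: Seq) -> float:
    """u(B|A) = u(A concatenated with B) - u(A)."""
    return u(a + b) - u(a)

def distinct(seq: Seq) -> int:
    """Toy sequence function: number of distinct elements."""
    return len(set(seq))

print(is_dominated(("a", "c"), ("a", "b", "c")))   # True
print(is_dominated(("c", "a"), ("a", "b", "c")))   # False
print(marginal(distinct, ("b", "c"), ("a", "b")))  # 1
```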

5 Submodular Non-decreasing Sequence Functions 

In this section we define the class of submodular non-decreasing sequence functions. In the next 
sections we provide a greedy heuristic for maximizing such functions subject to a given maximum 
length for the solution sequence. 

Let $S$ be a finite set and $u : \mathbb{H}(S) \to \mathbb{R}$ be a sequence function. We define the following
conditions:

Condition 5.1 (Non-Decreasing). A sequence function $u$ is non-decreasing if:

$\forall A, B \in \mathbb{H}(S) : A \prec B \Rightarrow u(A) \le u(B)$ (5.1)
$u(\emptyset) = 0$ (5.2)
Condition 5.2 (Sequence-Submodularity). A sequence function $u$ is sequence-submodular if:

$\forall A, B, C \in \mathbb{H}(S) : A \prec B \Rightarrow u(C|A) \ge u(C|B)$ (5.3)

Condition 5.3 (Differentiability). This condition only applies to continuous sequence functions.
Note that we use the term "continuous sequence function" to signify that the argument to the
function is a continuous sequence, not that the function itself is continuous; the differentiability
condition that we define next, however, is a property of the function. A continuous sequence
function $u : \mathbb{H}^C(S) \to \mathbb{R}$ satisfies the differentiability condition if for any $A \in \mathbb{H}^C(S)$,
$u(A_{[0,t)})$ is continuous and differentiable with a continuous derivative with respect to $t$ for
$t \in [0, \infty)$, except that at a finite number of points it may have different left and right
derivatives and thus a non-continuous derivative.
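
On small discrete instances, Conditions 5.1 and 5.2 can be checked by brute force. The sketch below is only an illustration: it enumerates sequences up to a bounded length (so it can refute a condition but not prove it in general), and the coverage function and all names are hypothetical examples rather than part of the paper.

```python
from itertools import product

def is_subsequence(a, b):
    """Discrete domination: a can be obtained by deleting elements of b."""
    it = iter(b)
    return all(x in it for x in a)

def check_conditions(u, S, max_len):
    """Return (non_decreasing, sequence_submodular), checked only over
    sequences A, B, C of length at most max_len."""
    seqs = [seq for k in range(max_len + 1) for seq in product(S, repeat=k)]
    pairs = [(a, b) for a in seqs for b in seqs if is_subsequence(a, b)]
    non_decreasing = u(()) == 0 and all(u(a) <= u(b) for a, b in pairs)
    submodular = all(
        u(a + c) - u(a) >= u(b + c) - u(b) for a, b in pairs for c in seqs
    )
    return non_decreasing, submodular

# Hypothetical example: each action covers some items, u counts covered items.
COVER = {"a": {1, 2}, "b": {2, 3}, "c": {3}}

def coverage(seq):
    covered = set()
    for s in seq:
        covered |= COVER[s]
    return len(covered)

print(check_conditions(coverage, "abc", 2))               # (True, True)
print(check_conditions(lambda s: len(s) ** 2, "ab", 2))   # (True, False)
```

The second function fails Condition 5.2 because its marginal values grow as the prefix gets longer, which is the opposite of diminishing returns.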



6 Greedy Heuristic (Discrete) 

Here we provide a greedy heuristic for maximizing non-decreasing submodular sequence functions
(discrete). Let $S$ be a finite set and $u : \mathbb{H}^D(S) \to \mathbb{R}$ be a non-decreasing submodular sequence
function. Consider the problem of finding a sequence $H \in \mathbb{H}^D(S)$ that maximizes $u$ subject to
$|H| \le T$ for a given $T \in \mathbb{N}$. Also suppose that $O \in \mathbb{H}^D(S)$ where $O = (r_1, \ldots, r_T)$ is the optimal
solution to this problem.

Lemma 6.1. For any $A, B \in \mathbb{H}^D(S)$ there exists $s \in S$ such that $u(s|A) \ge \frac{1}{|B|} u(B|A)$.

Proofs that are omitted here can be found in the appendix.
We use Lemma 6.1 to prove the following theorem:

Theorem 6.2. For a sequence $H \in \mathbb{H}^D(S)$ where $H = (s_1, \ldots, s_T)$ and $\alpha \in [0, 1]$, if:

$\forall i \in \{1, \ldots, T\}, \forall s \in S : u(s_i | H_{[1,i-1]}) \ge \alpha\, u(s | H_{[1,i-1]})$ (6.1)

then:

$\frac{u(H)}{u(O)} \ge 1 - \frac{1}{e^{\alpha}}$

The condition of Theorem 6.2 simply says that $H = (s_1, \ldots, s_T)$ should be chosen by
choosing each $s_i$ locally such that $u(s_i | H_{[1,i-1]})$ is at least $\alpha$ times its optimal local maximum.
Setting $\alpha = 1$ means we can compute the locally optimal $s_i$ conditioned on $s_1, \ldots, s_{i-1}$. Based on
this intuition we present the greedy Algorithm 1 to find $H$.

$H^0 \leftarrow \emptyset$ ;
for $i = 1$ to $T$ do
    find $s_i$ that maximizes $u(s_i | H^{i-1})$ ;
    $H^i \leftarrow H^{i-1} \perp s_i$ ;
end
$H \leftarrow H^T$ ;

Algorithm 1: Greedy algorithm for the discrete case



The greedy Algorithm 1 starts with an empty sequence $H$ and then builds the complete
sequence by finding at iteration $i$ the $s_i$ that gives the highest increase in the value of $u$ when
appended to the end of the current sequence, or more formally the $s_i$ that maximizes $u(s_i | H^{i-1})$
(or equivalently maximizes $u(H^{i-1} \perp s_i)$). Also note that at the step of Algorithm 1 where we find
the $s_i$ that maximizes $u(s_i | H^{i-1})$, we may not be able to find the locally optimal $s_i$; instead we
may only be able to find an $s_i$ for which $u(s_i | H^{i-1})$ is at least $\alpha$ times its locally optimal maximum.
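
Algorithm 1 translates almost directly into code. The following is a minimal sketch for the exact case ($\alpha = 1$), assuming that $u$ is given as a Python callable on tuples and that $S$ is small enough to scan in full; the coverage function and the names `greedy`, `COVER` and `coverage` are hypothetical and only used for the example.

```python
def greedy(u, S, T):
    """Build H one element at a time, always appending the element
    with the largest marginal value u(s | H)."""
    H = ()
    for _ in range(T):
        best = max(S, key=lambda s: u(H + (s,)) - u(H))
        H = H + (best,)
    return H

# Hypothetical example: each action covers some items, u counts covered items.
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}

def coverage(seq):
    covered = set()
    for s in seq:
        covered |= COVER[s]
    return len(covered)

H = greedy(coverage, ["a", "b", "c"], 2)
print(H, coverage(H))
```

Each iteration evaluates $u$ once per element of $S$, so the sketch makes on the order of $T \cdot |S|$ calls to $u$. An $\alpha$-approximate oracle would replace the exact `max` with any routine whose choice is within a factor $\alpha$ of the best marginal value.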

Theorem 6.3. For any non-decreasing submodular function $u$ and any given $T \in \mathbb{N}$, the greedy
Algorithm 1 can be used to find a sequence that produces a value of $u$ which is at least $1 - \frac{1}{e^{\alpha}}$ times
the optimal. In particular, if we can locally find the optimal at each iteration, the resulting sequence
gives a value of $u$ which is at least $1 - \frac{1}{e}$ of the global optimal.

Proof. The proof of Theorem 6.3 trivially follows from Theorem 6.2 and Algorithm 1. □
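
For intuition, the bound $1 - \frac{1}{e^{\alpha}}$ can be obtained from Lemma 6.1 by the standard telescoping argument for greedy algorithms. The derivation below is a sketch under the assumptions $|O| \le T$ and $u(s_i | H^{i-1}) \ge \alpha \max_{s} u(s | H^{i-1})$ at every step; the proofs in the appendix may be organized differently.

```latex
\begin{align*}
% Lemma 6.1 with A = H^{i-1}, B = O; then Condition 5.1, since O \prec H^{i-1} \perp O
\max_{s \in S} u(s \mid H^{i-1})
  &\ge \frac{1}{|O|}\bigl(u(H^{i-1} \perp O) - u(H^{i-1})\bigr)
   \ge \frac{1}{T}\bigl(u(O) - u(H^{i-1})\bigr) \\
% one greedy step with an \alpha-approximate local choice
u(H^{i}) - u(H^{i-1})
  &\ge \frac{\alpha}{T}\bigl(u(O) - u(H^{i-1})\bigr) \\
% rearranging: the gap to the optimum shrinks geometrically
u(O) - u(H^{i})
  &\le \Bigl(1 - \frac{\alpha}{T}\Bigr)\bigl(u(O) - u(H^{i-1})\bigr) \\
% unrolling over i = 1, \dots, T with u(\emptyset) = 0 and 1 - x \le e^{-x}
u(O) - u(H^{T})
  &\le \Bigl(1 - \frac{\alpha}{T}\Bigr)^{T} u(O) \le e^{-\alpha}\, u(O)
\end{align*}
```

Dividing the last line by $u(O)$ gives $u(H)/u(O) \ge 1 - \frac{1}{e^{\alpha}}$.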



7 Greedy Heuristic (Continuous) 

In this section we provide an equivalent of the greedy heuristic of section 6 for the continuous
version. Let $S$ be a finite set and $u : \mathbb{H}^C(S) \to \mathbb{R}$ be a differentiable non-decreasing submodular
sequence function. Consider the problem of finding a continuous sequence $H \in \mathbb{H}^C(S)$ that
maximizes $u$ subject to $|H| \le T$ for a given $T \in \mathbb{R}^+$. Also suppose that $O \in \mathbb{H}^C(S)$ where
$O = ((r_1, \Delta w_1), \ldots, (r_k, \Delta w_k))$ is the optimal solution.

We define $\dot{u}_s(\delta|A)$ where $s \in S$, $\delta \in \mathbb{R}^+$ and $A \in \mathbb{H}^C(S)$ as follows:

$\dot{u}_s(\delta|A) = \frac{d}{d\delta} u((s, \delta)|A)$ (7.1)

$= \frac{d}{d\delta} \bigl(u(A \perp (s, \delta)) - u(A)\bigr)$ (7.2)

$= \frac{d}{d\delta} u(A \perp (s, \delta))$ (7.3)

We also define $\dot{u}_s(\delta|A)$ at $\delta = 0$ as follows:

$\dot{u}_s(0|A) = \lim_{\delta \to 0^+} \dot{u}_s(\delta|A)$ (7.4)

Note that (7.1) is always defined because we are assuming that $u$ satisfies Condition 5.3
and (7.3) can be written as $\frac{d}{d\delta} u((A \perp (s, \infty))_{[0,|A|+\delta)})$. Also note that according to Condition 5.3, $\dot{u}_s$
is a continuous function over $\mathbb{R}^+$ except at a finite number of points.

Corollary 7.1. For any $A \in \mathbb{H}^C$ of the form $A = ((s_1, \Delta t_1), \ldots, (s_k, \Delta t_k))$, let $A^i = ((s_1, \Delta t_1), \ldots, (s_i, \Delta t_i))$;
then all of the following hold:

$u((s, \delta)|A) = \int_{0}^{\delta} \dot{u}_s(x|A)\,dx$ (7.5)

$u((s, \delta_2)|A \perp (s, \delta_1)) = \int_{\delta_1}^{\delta_1 + \delta_2} \dot{u}_s(x|A)\,dx$ (7.6)

$u(A) = \sum_{i=1}^{k} \int_{0}^{\Delta t_i} \dot{u}_{s_i}(x|A^{i-1})\,dx$ (7.7)

Proof. (7.5) and (7.6) trivially follow from (7.1), and (7.7) follows from the definition of marginal
values. □



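Lemma 7.2. For any $A, B \in \mathbb{H}^C$ such that $A \prec B$ and any $s \in S$, we have $\dot{u}_s(\delta|A) \ge \dot{u}_s(\delta|B)$ for
any $\delta \in \mathbb{R}^+ \cup \{0\}$ except at a finite number of points.

Proof. The proof is by contradiction. Suppose there are $A, B \in \mathbb{H}^C$ such that $A \prec B$, and $s \in S$
and $\delta \in \mathbb{R}^+$, for which $\dot{u}_s(\delta|A) < \dot{u}_s(\delta|B)$. If either $\dot{u}_s(\delta|A)$ or $\dot{u}_s(\delta|B)$ is non-continuous at $\delta$ then
this is one of the finite number of points that are exceptions in Lemma 7.2. Otherwise, since they
are both continuous at $\delta$, there should be a small neighborhood around $\delta$ in which $\dot{u}_s(\delta|B)$ is greater
than $\dot{u}_s(\delta|A)$. More formally:

$\exists \epsilon \in \mathbb{R}^+, \forall x \in [\delta - \epsilon, \delta + \epsilon] : \dot{u}_s(x|A) < \dot{u}_s(x|B)$ (7.8)

Now we show that (7.8) can never happen:

$u((s, \epsilon)|A \perp (s, \delta - \epsilon)) = \int_{\delta - \epsilon}^{\delta} \dot{u}_s(x|A)\,dx$ (7.9)

$u((s, \epsilon)|A \perp (s, \delta - \epsilon)) < \int_{\delta - \epsilon}^{\delta} \dot{u}_s(x|B)\,dx$ (7.10)

$u((s, \epsilon)|A \perp (s, \delta - \epsilon)) < u((s, \epsilon)|B \perp (s, \delta - \epsilon))$ (7.11)

Notice that $A \perp (s, \delta - \epsilon) \prec B \perp (s, \delta - \epsilon)$ and therefore (7.11) contradicts Condition 5.2, which
says that $u$ is a submodular sequence function. This shows that our assumption of $\dot{u}_s(\delta|A) < \dot{u}_s(\delta|B)$ leads
to a contradiction, which completes the proof. □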

Corollary 7.3. For any $A \in \mathbb{H}^C(S)$ and any $\delta \in [0, \infty)$, $\dot{u}_s(\delta|A)$ is a monotonically non-increasing
function of $\delta$. That is, $\delta_1 \le \delta_2 \Rightarrow \dot{u}_s(\delta_1|A) \ge \dot{u}_s(\delta_2|A)$.

Proof. The proof is similar to the proof of Lemma 7.2. □

The following lemma is the equivalent of Lemma 6.1 for the continuous case.

Lemma 7.4. For any $A, B \in \mathbb{H}^C(S)$ there exists $s \in S$ such that $\dot{u}_s(0|A) \ge \frac{1}{|B|} u(B|A)$.

Next we present our main result for this section. 

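Theorem 7.5. For any sequence $H \in \mathbb{H}^C(S)$ where $H = ((s_1, \Delta t_1), \ldots, (s_k, \Delta t_k))$ and $|H| = T$
and $\alpha \in [0, 1]$, if:

$\forall t \in [0, T), \forall s \in S : \frac{d}{dt} u(H_{[0,t)}) \ge \alpha\, \dot{u}_s(0|H_{[0,t)})$ (7.12)

then:

$\frac{u(H)}{u(O)} \ge 1 - \frac{1}{e^{\alpha}}$ (7.13)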
$t \leftarrow 0$ ;
$i \leftarrow 1$ ;
while $t < T$ do
    find $(s_i, \Delta t_i)$ such that $\forall s \in S, \forall \delta \in [0, \Delta t_i) : \dot{u}_{s_i}(0|H^{i-1} \perp (s_i, \delta)) \ge \alpha\, \dot{u}_s(0|H^{i-1} \perp (s_i, \delta))$ ;
    $H^i \leftarrow H^{i-1} \perp (s_i, \Delta t_i)$ ;
    $t \leftarrow t + \Delta t_i$ ;
    $i \leftarrow i + 1$ ;
end
$H \leftarrow H^{i-1}$ ;

Algorithm 2: Greedy algorithm for the continuous case

The condition of Theorem 7.5 simply says that $H$ should be chosen such that at each point
$t \in [0, T)$ the derivative of $u(H_{[0,t)})$ is at least $\alpha$ times its optimal local maximum. Setting $\alpha = 1$
means that at each $t \in [0, T)$ we can find the best $s \in S$ conditioned on $H_{[0,t)}$. Based on this intuition
we present the generic greedy Algorithm 2 to find $H$. This algorithm may not terminate in general;
however, if it terminates, then for the resulting $H$, $u(H)$ is at least $(1 - \frac{1}{e^{\alpha}})$ times the optimal.
There can be other ways of finding such an $H$ for each specific problem, and we show
one such example later in this paper.

The main loop of Algorithm 2 needs an Incremental Oracle that is specific to
each problem. As mentioned before, there might be other ways of finding a sequence $H$ that
satisfies the condition of Theorem 7.5, and as long as it satisfies that condition we have the
$1 - \frac{1}{e^{\alpha}}$ guarantee.

8 Online ad allocation problem 

The motivation for the online ad allocation problem is keyword-based ad auctions. In these auctions,
each advertiser submits to the search engine a bid for each keyword plus a total budget. Based on
this information, the search engine should decide which ads to show for each keyword. The objective
of online ad allocation is to find a way to perform this allocation with maximum revenue for the
search engine. Assuming that for each query the search engine can show $d$ ads simultaneously, the
online ad allocation problem can be defined as follows: We have $m$ ads and $n$ distinct keywords
(query types). Let $M$ be the set of ads and $N$ the set of query types. Let $p_{ij}$ be the expected
payment of the advertiser to the search engine for showing ad $i$ for a query of type $j$. The expected
payment could be computed based on the click-through rate of the ad, the relevance of the ad to
the keyword, the bid of the advertiser for that keyword and possibly other parameters. Also each
ad $i$ has a budget $B_i$. The goal is to assign incoming queries to ads as they arrive in such a way that
the profit of the search engine in a given time period is maximized. Here we make the assumption
that the types of the incoming queries are i.i.d. random variables drawn from a fixed but possibly
unknown distribution $q_j$, where $q_j$ is the probability of a query being of type $j$ ($\sum_j q_j = 1$). Also
we assume that the expected payments ($p_{ij}$) are small compared to the budgets ($B_i$). Note that in a
sequence of $r$ queries the expected number of queries of type $j$ is $r q_j$. We would like to express
this as a function of time, so we define a virtual time based on the number of queries that have
arrived so far. In terms of our virtual time, the expected number of queries of type $j$ arriving in a
period of length $\Delta t$ is $\Delta t\, q_j$. Throughout the rest of this section we will omit the word "virtual"
and always use "time" to refer to virtual time unless explicitly stated otherwise. Also let $T$ be the
end of the time period in terms of the virtual time. So the problem is to find an allocation that
maximizes the revenue of the search engine in time $[0, T)$.

Consider the offline version of the problem in which we know the queries in advance. We could
solve the problem using LP rounding and get a solution close to optimal (with an approximation
ratio very close to 1 assuming that $p_{ij} \ll B_i$). Now consider the online version of the problem in
which we know the distribution $q_j$. Again we could use LP rounding to get a solution with an
expected value very close to the optimal expected value. There are two problems, however, with the
online version. The first is that we cannot use LP if we do not know the distribution, and the
second is that due to the huge size of the input it is not possible to use LP rounding in practice
for this problem.

Next we consider the greedy algorithm and show that its expected performance is at least
$1 - \frac{1}{e}$ of the optimal. The important advantages of the greedy algorithm are that it does not depend
on the distribution of the queries and that it is easy and fast to compute in real time even with huge
input data. As such it is being used in practice [4].

We define a "configuration" as a mapping of query types to ads such that each query type is
mapped to at most $d$ ads. Let $S$ be the set of all possible configurations. We can now represent any
allocation of ads to queries over time $[0, T)$ by a continuous sequence $H = ((s_1, \Delta t_1), \ldots, (s_k, \Delta t_k))$
where $s_i \in S$, $\Delta t_i \in \mathbb{R}^+$, $k \in \mathbb{N}$ and $|H| = T$, which means "Use each configuration $s_p$ (in order)
for a duration of $\Delta t_p$ for $p \in \{1, \ldots, k\}$". We call $H$ an "Allocation Strategy".

Let $u(H)$ be the expected utility of the search engine for using an allocation strategy $H$. Note
that for any given sequence of queries we can say, based on $H$, exactly which ads are displayed for
each incoming query, and so we can directly compute the utility of the search engine. Next we show
that $u$ is a Submodular Non-decreasing Sequence Function, and so using a greedy algorithm yields
an allocation that is at least $1 - \frac{1}{e}$ of the optimal. First we explain how the greedy algorithm works.

At any point in time, the greedy method chooses the best configuration as follows: for each
query type $j$, map it to the $d$ ads with the highest $p_{ij}$ among those that have not yet exhausted
their budgets, and denote them by $Q_j(s)$. Let $r(s)$ be the expected revenue rate of such a configuration $s$.
We can write $r(s)$ as follows:

$r(s) = \sum_{j \in N} \sum_{i \in Q_j(s)} q_j\, p_{ij}$

Note that the revenue of the search engine for using configuration $s$ for a short period of length
$\Delta t$, assuming that none of the ads exhaust their budget during that time, is given by $r(s)\Delta t$.

The greedy algorithm works as follows: Choose the best configuration (the one with maximum
$r(s)$) as explained above by assigning query type $j$ to the $d$ ads with the highest $p_{ij}$ among those that
have not yet exhausted their budgets. Keep that configuration until at least one of the ads runs
out of budget. Then recompute the best configuration and switch to it. It is easy to see that the
derivative of $u(H_{[0,t)})$ with respect to $t$ is $r(s)$, where $s = H(t)$ is the configuration that is active
at time $t$ in $H$. That is because $\frac{d}{dt} u(H_{[0,t)}) = \dot{u}_s(0|H_{[0,t)})$ is exactly the rate at which the search
engine is accumulating profit at time $t$, which is $r(s)$, and for all other $s' \in S$ we have $r(s') \le r(s)$
at time $t$. That also means that our greedy algorithm satisfies the requirement of the incremental
oracle in Algorithm 2, as the current configuration always has a revenue rate at least as high as all the
other configurations. Also note that we may need to change the configuration only when an ad
runs out of budget, which means the total number of configuration changes is no more than $m$. The
only thing that remains to be shown is that the utility function $u$ is a Submodular Non-decreasing
Sequence Function, which we prove next.
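
The greedy allocation described above can be simulated in virtual time. The sketch below is an illustration under simplifying assumptions that go beyond the text: budgets are spent at the deterministic expected rate $q_j p_{ij}$ (a fluid model, reasonable only when payments are small compared to budgets), and the distribution $q$ is passed in so that the rates can be computed. The function name `greedy_allocation`, its parameters and the tolerance `eps` are hypothetical.

```python
def greedy_allocation(p, budgets, q, d, T, eps=1e-12):
    """p[i][j]: expected payment of ad i for query type j.
    Returns (revenue, strategy), where strategy is a list of
    (configuration, duration) pairs, i.e. a continuous sequence."""
    m, n = len(budgets), len(q)
    remaining = [float(b) for b in budgets]
    t, revenue, strategy = 0.0, 0.0, []
    while t < T - eps:
        # Q_j(s): the d best-paying ads for type j that still have budget.
        config = []
        for j in range(n):
            alive = [i for i in range(m) if remaining[i] > eps and p[i][j] > 0]
            alive.sort(key=lambda i: p[i][j], reverse=True)
            config.append(alive[:d])
        # Spending rate of each ad; their sum is the revenue rate r(s).
        rate = [0.0] * m
        for j in range(n):
            for i in config[j]:
                rate[i] += q[j] * p[i][j]
        r = sum(rate)
        if r <= eps:
            break  # every useful ad has run out of budget
        # Keep the configuration until time T or until some ad is exhausted.
        dt = min([T - t] + [remaining[i] / rate[i] for i in range(m) if rate[i] > 0])
        for i in range(m):
            remaining[i] -= rate[i] * dt
            if remaining[i] <= eps:
                remaining[i] = 0.0
        revenue += r * dt
        strategy.append((config, dt))
        t += dt
    return revenue, strategy

# Two ads, one query type, one slot: ad 0 pays more but has a small budget.
revenue, strategy = greedy_allocation(
    p=[[2.0], [1.0]], budgets=[4.0, 10.0], q=[1.0], d=1, T=5.0
)
print(revenue, strategy)
```

In the example, ad 0 is shown until its budget runs out at time 2 and ad 1 is shown for the remaining 3 time units, so the configuration changes once, in line with the bound of at most $m$ changes.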

Lemma 8.1. The utility function of the online ad allocation problem satisfies Condition 5.1. In
particular, consider allocation strategies $A, B \in \mathbb{H}$ and assume that $A \prec B$. The remaining budget
of each ad at the end of using $B$ is less than or equal to its remaining budget at the end of using $A$.

Proof. Consider allocation strategies $A, B \in \mathbb{H}$ and assume that $A \prec B$. We argue that the
profit extracted from each ad in $B$ is at least as much as the profit extracted from that ad in
sequence $A$.

We partition the ads into two categories: 

• Ads that have no budget left after running sequence B. 

• Ads that still have budget after running sequence B. 

In the former case, sequence B extracted the maximum possible budget from the ad. So for this 
set of ads, our claim holds. 

For the ads that belong to the second category, we know that they still have budget available. 
Consider an ad i that belongs to this category. We will show that the profit extracted by B from 
this ad is at least as much as the profit extracted by A. 

Consider a configuration $s \in S$ that is active in $B$ for a total time of $\Delta t$. For all queries of
type $j$ that arrive during that time and any ad $i$ that is allocated to them by configuration $s$, we
know that the profit extracted from the budget of ad $i$ by those queries is $\Delta t\, q_j\, p_{ij}$ because ad $i$ never
ran out of budget. Since $A \prec B$, configuration $s$ is either not present in $A$ or was used in $A$ for
no more total time than in $B$, and so the total profit extracted from ad $i$ in $A$ is no more than the profit
extracted from ad $i$ in $B$.

Since for both categories the expected profit extracted by B from each ad is higher than or 
equal to the profit extracted by A from that ad, we can conclude that the non-decreasing property 
holds. □ 

Next, we show that Condition 5.2 holds as well. 

Lemma 8.2. Online ad allocation problem satisfies Condition 5.2. 

Proof. Consider allocation strategies $A, B, C \in \mathbb{H}$ and assume that $A \prec B$. First of all, based
on Lemma 8.1, we know that the remaining budget of each ad after $B$ is less than or equal to its
remaining budget after $A$. It is also easy to see that the contribution of each ad to $u(C|B)$ or
$u(C|A)$ is equal to the difference in its budget before and after using $C$. Now, consider using
the allocation strategy $A$ first, followed by $C$. Again we partition the ads into two categories:

• Ads that have exhausted all of their budget after running $A \perp C$.

• Ads that still have budget after running $A \perp C$.

The contribution of the ads in the first category to $u(C|B)$ is no more than their contribution
to $u(C|A)$ because they had equal or more remaining budget after using $A$ than after using $B$ and
they have contributed all of their remaining budget to $u(C|A)$.
Now consider the ads that belong to the second category. By the same reasoning as in
the proof of Lemma 8.1 we conclude that $C$ has extracted profit from those ads at full rate
since they did not run out of budget. So their contribution to $u(C|A)$ and $u(C|B)$ is equal. □

Finally, we can show that Condition 5.3 is also met. Notice that the derivative of the utility 
function is a step function that changes its value only when either in the sequence there is change 
of configuration or when some ad runs out of budget. The utility function is therefore differentiable 
and its derivative is continuous except on the endpoints of each piece. The total number of pieces 
is bounded by the number of ads which is finite. Therefore we conclude that the utility function is 
differentiable and its derivative is continuous except at a finite number of points. 

Using the above properties, we conclude that the approximation ratio of the greedy algorithm that
selects the best configuration at each point in time is $(1 - \frac{1}{e})$.

9 Query Rewriting 

Query rewriting is a common mechanism used in information retrieval to improve the relevance of
the returned results. This method has also been used in search advertising. When dealing with
large data sets, handling queries in real time might not be possible if we need to access a large
portion of the data set, so some filtering might be required. Query rewriting tries to do the filtering
while preserving quality as much as possible. At a high level, query rewriting outputs a list
of rewrites relevant to the original query. In this section we focus on the specific query rewriting
problem related to search advertising, which is defined in [10] and explained next.

We are given a set $M$ of ads and a set $N$ of query types (keywords). Also for each ad $i$ and
query $j$ define $p_{ij}$, $B_i$ and $q_j$ as in section 8. We are also given a set $R$ of rewrites. Each rewrite
$r \in R$ is associated with a small subset of the ads, which we denote by $W_r$ and which is also given to us.
The goal is to associate each query type with at most $k$ rewrites so that later the ad-allocator only
considers the ads that are associated with the rewrites of each query type in order to find the best
ads to show for incoming queries of that type. Suppose that $Y_j$ is the set of rewrites associated with
query $j$. Formally, the ad-allocator will then only consider the ads $i$ such that $i \in \bigcup_{r \in Y_j} W_r$ to
find the best $d$ ads to show for queries of type $j$. The problem is how to find the sets $Y_j$ so as to
maximize the maximum profit that can be extracted by the ad allocator.

Next we give a greedy algorithm that gives a $1 - \frac{1}{e^{1 - 1/e}} \approx 0.47$ approximation, which improves
the previous 0.25 approximation given in [10].
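
The constant 0.47 has the form of the bound $1 - \frac{1}{e^{\alpha}}$ from Theorem 6.3 evaluated at $\alpha = 1 - \frac{1}{e}$, which would correspond to an incremental oracle that is itself a $(1 - \frac{1}{e})$-approximation; this reading is an assumption made here for illustration. The arithmetic can be checked directly (the variable names are hypothetical):

```python
import math

alpha = 1 - 1 / math.e          # assumed quality of the inner oracle
ratio = 1 - math.exp(-alpha)    # 1 - 1/e^alpha, the form of the Theorem 6.3 bound

print(round(alpha, 4), round(ratio, 4))
```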

We define a "partial allocation" as a tuple of the form $(j, Y_j, B^j)$, where $B^j = (B_{1j}, \cdots, B_{mj})$ is
a vector of budgets in which $B_{ij}$ is the maximum budget that we allow the ad allocator to extract
from ad $i$ for displaying it for query type $j$, and $Y_j$ is the set of rewrites for query type $j$. Note
that any solution of the query rewriting problem and the corresponding allocation problem can
be written as a sequence of the following form:

$$H = (j_1, Y_{j_1}, B^{j_1}), \cdots, (j_n, Y_{j_n}, B^{j_n}) \qquad (9.1)$$

We now define a utility function $u(H)$ on sequences of the above form as follows:

Definition 9.1 ($u(H)$). Initialize $u$ to 0 and set each $B_i$ to the total budget of ad $i$. For each
of the partial allocation tuples, in order, do the following. Suppose the current tuple is $(j, Y_j, B^j)$. Set
the current budget limit of each ad $i$ to the minimum of $B_i$ and $B_{ij}$. Also suppose that $p_{ij} = 0$ for all
ads that are not associated with any of the rewrites in $Y_j$, and ignore all queries of types other
than $j$. Use the greedy algorithm of section 8 to solve the assignment problem for only the queries
of type $j$ under the current budget limits. Notice that the greedy algorithm is optimal when we
have only one query type. After computing the allocation, add its profit to $u$, update each $B_i$ to reflect how much of
its budget has been used by $j$, and then proceed to the next tuple in the sequence. Let $B(H)$ denote
the vector of remaining budgets at the end of this process (we will use this notation later). Suppose
$OPT = (1, Y_1, B^1), \cdots, (n, Y_n, B^n)$ is the sequence in which the $Y_j$'s are the optimal rewrites and $B^j$
is the vector of budgets used for query type $j$ in the corresponding optimal allocation. Clearly $u(OPT)$ is
equal to the total utility of the search engine for the optimal solution.
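The evaluation procedure of Definition 9.1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the single-query-type solver of section 8 is not reproduced here, so it is passed in as a hypothetical callable `solve_single_type`, and the sketch assumes that the profit of an allocation equals the budget it spends. The names `evaluate_sequence`, `toy_solver` and `demand` are invented for the example.

```python
from typing import Callable, Dict, List, Set, Tuple

Ad = str
Query = str
# A partial allocation (j, Y_j, B^j): query type, rewrites, per-ad budget caps.
Partial = Tuple[Query, Set[str], Dict[Ad, float]]
# Hypothetical stand-in for the single-query-type solver of section 8:
# given per-ad spending limits, return how much each ad spends on type j.
Solver = Callable[[Query, Dict[Ad, float]], Dict[Ad, float]]


def evaluate_sequence(H: List[Partial],
                      total_budget: Dict[Ad, float],
                      ads_of_rewrite: Dict[str, Set[Ad]],
                      solve_single_type: Solver) -> Tuple[float, Dict[Ad, float]]:
    """Return (u(H), B(H)) by processing the tuples of H in order."""
    remaining = dict(total_budget)   # B_i starts at the full budget of ad i
    utility = 0.0
    for j, rewrites, caps in H:
        # Only ads attached to a chosen rewrite are visible for query type j.
        eligible = set().union(*(ads_of_rewrite[r] for r in rewrites))
        # Current limit of ad i is min(B_i, B_ij).
        limits = {i: min(remaining[i], caps.get(i, 0.0)) for i in eligible}
        spent = solve_single_type(j, limits)
        for i, amount in spent.items():
            remaining[i] -= amount
            utility += amount        # assumption: profit equals budget spent
    return utility, remaining


# Hypothetical toy data: the most that ad i can usefully spend on query type j.
demand = {"q1": {"a": 3.0, "b": 2.0}, "q2": {"a": 4.0}}


def toy_solver(j: Query, limits: Dict[Ad, float]) -> Dict[Ad, float]:
    return {i: min(cap, demand[j].get(i, 0.0)) for i, cap in limits.items()}


W = {"r1": {"a"}, "r2": {"b"}}
budget = {"a": 5.0, "b": 1.0}
H = [("q1", {"r1", "r2"}, {"a": 5.0, "b": 1.0}),
     ("q2", {"r1"}, {"a": 5.0})]
u_value, left = evaluate_sequence(H, budget, W, toy_solver)
```

In this toy run, query type `q1` spends 3 from ad `a` and 1 from ad `b`, and `q2` then spends the 2 units that ad `a` has left, so the order of the tuples matters through the remaining budgets.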

Lemma 9.2. $u(H)$ as defined in Definition 9.1 is a non-decreasing submodular sequence function.

Proof Sketch: We only sketch the proof, as it is very similar to the proofs of Lemma 8.1
and Lemma 8.2. Again we separate the ads into two groups: those that have exhausted their budget
at the end of computing $u$ using Definition 9.1 and those that still have budget left. We can
then verify that for any two sequences of partial allocations $A$ and $B$ such that $A \prec B$, all of the ads
that are in the first group after computing $u(B)$ using Definition 9.1 have made their maximum
contribution to $u(B)$ and so cannot contribute more to $u(A)$. For all the other ads we can show
that their contributions to $u(A)$ and $u(B)$ are equal. The submodularity property follows in the
same way as for Lemma 8.2. □

In order to use the greedy algorithm of algorithm 1 to approximate OPT, we
need an oracle that can find the best partial allocation $(j, Y_j, B^j)$ to append to the current
sequence. The marginal utility of adding a partial allocation $(j, Y_j, B^j)$ is a non-decreasing submodular
function of $Y_j$, subject to the constraint $|Y_j| = k$. Therefore, for each $j$ we can get a
$1 - 1/e$ approximation by using a greedy algorithm that starts from an empty $Y_j$ and repeatedly adds the rewrite that
increases the marginal utility the most until $k$ rewrites have been added. We then select, among
all possible query types $j$, the one for which $(j, Y_j, B^j)$ has the highest marginal utility and append
it to the current sequence of partial allocations. Since we are approximating the best $(j, Y_j, B^j)$
within a factor of $1 - 1/e$, based on Theorem 6.3 the approximation ratio of the overall algorithm is
$1 - 1/e^{1 - 1/e} \approx 0.47$. The complete algorithm is described in algorithm 3.
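The two nested greedy loops described above can be sketched in Python. This is a simplified illustration, not the paper's algorithm verbatim: budgets are not tracked explicitly, and the marginal utility $u(\cdot \mid H)$ is supplied as a hypothetical callable `marginal`. The toy utility, in which each ad can be counted for only one query type, is invented for the example.

```python
from typing import Callable, FrozenSet, Hashable, List, Sequence, Tuple

Step = Tuple[Hashable, FrozenSet[Hashable]]
Marginal = Callable[[Hashable, FrozenSet[Hashable], List[Step]], float]


def greedy_rewrites(query_types: Sequence[Hashable],
                    rewrites: Sequence[Hashable],
                    k: int,
                    marginal: Marginal) -> List[Step]:
    """Outer greedy over query types, inner greedy over rewrites."""
    H: List[Step] = []
    remaining = list(query_types)
    while remaining:
        best_gain, best_step = float("-inf"), None
        for j in remaining:
            Y: set = set()
            # Inner greedy: add the rewrite with the largest marginal utility.
            for _ in range(min(k, len(rewrites))):
                r_best = max((r for r in rewrites if r not in Y),
                             key=lambda r: marginal(j, frozenset(Y | {r}), H))
                Y.add(r_best)
            gain = marginal(j, frozenset(Y), H)
            if gain > best_gain:
                best_gain, best_step = gain, (j, frozenset(Y))
        # Append the best partial allocation and retire its query type.
        H.append(best_step)
        remaining.remove(best_step[0])
    return H


# Hypothetical toy utility: an ad counts only for the first query type using it.
W = {"r1": {"a"}, "r2": {"b"}, "r3": {"c"}}
value = {"q1": {"a": 3.0, "b": 1.0}, "q2": {"a": 2.0, "c": 2.0}}


def toy_marginal(j, Y, H):
    used = {ad for _, Y_prev in H for r in Y_prev for ad in W[r]}
    fresh = {ad for r in Y for ad in W[r]} - used
    return sum(value[j].get(ad, 0.0) for ad in fresh)


plan = greedy_rewrites(["q1", "q2"], ["r1", "r2", "r3"], k=1, marginal=toy_marginal)
```

In the toy run, `q1` takes rewrite `r1` first because it has the larger marginal utility; once ad `a` is used, `q2` falls back to `r3`.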

10 Conclusion 

In this paper, we defined the notion of submodularity for functions over sequences and then showed
that if a sequence function is submodular and non-decreasing, the approximation ratio of the greedy
algorithm for maximizing such a function subject to a maximum length constraint on the solution
sequence is $1 - 1/e$. As an example, we modeled the online ad allocation problem in this framework,
implying that a greedy approach achieves a $1 - 1/e$ approximation, assuming that the distribution of
queries over keywords does not change over time (i.e., queries are i.i.d. random variables).
A Proofs

Proof of Lemma 6.1. Suppose $B = (s_1, \cdots, s_k)$. Using the definition of $u$ we have:

$$u(B|A) = \sum_{i=1}^{k} u(s_i \,|\, A \perp B_{[1,i-1]}) \qquad (A.1)$$

The sum on the right hand side of (A.1) consists of $k$ terms, so there must be at least one
term that is greater than or equal to the average of the terms. That means there is an index
$1 \le j' \le k$ such that (A.2) holds.

$$u(s_{j'} \,|\, A \perp B_{[1,j'-1]}) \ge \frac{1}{k} u(B|A) \qquad (A.2)$$

$$u(s_{j'} \,|\, A) \ge \frac{1}{k} u(B|A) \qquad (A.3)$$

Combining (A.2) with Condition 5.2, because $A \prec A \perp B_{[1,j'-1]}$, we get (A.3), which completes
the proof. □

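Proof of Theorem 6.2. According to Lemma 6.1, we argue that for any $H = (s_1, \cdots, s_T)$ and $\alpha$ for
which (6.1) holds, (A.4) must also hold.

$$u(s_i \,|\, H_{[1,i-1]}) \ge \frac{\alpha}{T} u(O \,|\, H_{[1,i-1]}) \qquad (A.4)$$

$$u(s_i \,|\, H_{[1,i-1]}) \ge \frac{\alpha}{T} \big(u(O \perp H_{[1,i-1]}) - u(H_{[1,i-1]})\big) \qquad (A.5)$$

$$u(s_i \,|\, H_{[1,i-1]}) \ge \frac{\alpha}{T} \big(u(O) - u(H_{[1,i-1]})\big) \qquad (A.6)$$

$$u(H_{[1,i]}) - u(H_{[1,i-1]}) \ge \frac{\alpha}{T} \big(u(O) - u(H_{[1,i-1]})\big) \qquad (A.7)$$

$$u(H_{[1,i]}) \ge \frac{\alpha}{T} u(O) + \Big(1 - \frac{\alpha}{T}\Big) u(H_{[1,i-1]}) \qquad (A.8)$$

In order to derive (A.6) from (A.5) we have used Condition 5.1 to infer that $u(O \perp H_{[1,i-1]}) \ge u(O)$.

$$u(H_{[1,T]}) \ge \Big(1 - \Big(1 - \frac{\alpha}{T}\Big)^{T}\Big) u(O) \qquad (A.9)$$

$$u(H) \ge \Big(1 - \Big(\Big(1 - \frac{\alpha}{T}\Big)^{T/\alpha}\Big)^{\alpha}\Big) u(O) \qquad (A.10)$$

$$u(H) \ge \Big(1 - \frac{1}{e^{\alpha}}\Big) u(O) \qquad (A.11)$$

Notice that (A.8) defines a recurrence relation which can be solved to get (A.11), which completes
the proof. □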
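The step from the recurrence (A.8) to the closed forms (A.9) and (A.11) can be checked numerically. The sketch below assumes the recurrence has the form $x_i = \alpha/T + (1 - \alpha/T)\,x_{i-1}$ after normalising by $u(O)$, with $x_0 = 0$ (that is, the empty sequence has zero utility); the function name is illustrative.

```python
import math


def greedy_lower_bound(alpha: float, T: int) -> float:
    """Iterate x_i = alpha/T + (1 - alpha/T) * x_{i-1} from x_0 = 0."""
    x = 0.0
    for _ in range(T):
        x = alpha / T + (1.0 - alpha / T) * x
    return x


for alpha in (1.0, 1.0 - 1.0 / math.e):
    for T in (1, 5, 50, 500):
        iterated = greedy_lower_bound(alpha, T)
        closed_form = 1.0 - (1.0 - alpha / T) ** T   # assumed form of (A.9)
        limit = 1.0 - math.exp(-alpha)               # assumed form of (A.11)
        print(f"alpha={alpha:.3f} T={T:3d} "
              f"iterated={iterated:.4f} closed={closed_form:.4f} limit={limit:.4f}")
```

The iterated value agrees with the closed form for every $T$ and stays above $1 - 1/e^{\alpha}$, approaching it as $T$ grows.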
Proof of Lemma 7.4. Suppose $B = ((s_1, \Delta t_1), \cdots, (s_k, \Delta t_k))$ and let $B^i = ((s_1, \Delta t_1), \cdots, (s_i, \Delta t_i))$.
Using the definition of $u$ and (7.1) we have:

$$u(B|A) = \sum_{i=1}^{k} \int_0^{\Delta t_i} u_{s_i}(x \,|\, A \perp B^{i-1})\, dx \qquad (A.12)$$

We argue that there must be some $1 \le i \le k$ for which there exists some $\delta \in [0, \Delta t_i)$ such that
$u_{s_i}(\delta \,|\, A \perp B^{i-1}) \ge \frac{1}{T} u(B|A)$. Otherwise the term inside the integral on the right hand
side of (A.12) is always less than $\frac{1}{T} u(B|A)$, which means the sum of the integrals would be less
than $u(B|A)$, contradicting (A.12). Suppose (A.13) holds for $i'$ and $\delta'$.

$$u_{s_{i'}}(\delta' \,|\, A \perp B^{i'-1}) \ge \frac{1}{T} u(B|A) \qquad (A.13)$$

$$u_{s_{i'}}(\delta' \,|\, A) \ge \frac{1}{T} u(B|A) \qquad (A.14)$$

$$u_{s_{i'}}(0 \,|\, A) \ge \frac{1}{T} u(B|A) \qquad (A.15)$$

We can infer (A.14) from (A.13) by using Lemma 7.2. Applying Corollary 7.3 to that, we get
(A.15), which completes the proof. □

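Proof of Theorem 7.5. Using Lemma 7.4 we have (A.16). Combining that with (7.12) we get
(A.17). Using the definition of marginal values and using Condition 5.2 we get (A.19), which is
a differential equation.

$$\forall t \in [0,T)\ \exists s \in S : u_s(0 \,|\, H_{[0,t)}) \ge \frac{1}{T} u(O \,|\, H_{[0,t)}) \qquad (A.16)$$

$$\forall t \in [0,T) : \frac{d}{dt} u(H_{[0,t)}) \ge \frac{\alpha}{T} u(O \,|\, H_{[0,t)}) \qquad (A.17)$$

$$\forall t \in [0,T) : \frac{d}{dt} u(H_{[0,t)}) \ge \frac{\alpha}{T} \big(u(O \perp H_{[0,t)}) - u(H_{[0,t)})\big) \qquad (A.18)$$

$$\forall t \in [0,T) : \frac{d}{dt} u(H_{[0,t)}) \ge \frac{\alpha}{T} \big(u(O) - u(H_{[0,t)})\big) \qquad (A.19)$$

We can rephrase (A.19) as (A.20) and solve it to get (A.24).

$$u(H_{[0,t)}) + \frac{T}{\alpha} \frac{d}{dt} u(H_{[0,t)}) \ge u(O) \qquad (A.20)$$

$$\frac{d}{dt} \Big(e^{\alpha t / T} u(H_{[0,t)})\Big) \ge \frac{\alpha}{T} e^{\alpha t / T} u(O) \qquad (A.21)$$

$$\int_0^x \frac{d}{dt} \Big(e^{\alpha t / T} u(H_{[0,t)})\Big)\, dt \ge \int_0^x \frac{\alpha}{T} e^{\alpha t / T} u(O)\, dt \qquad (A.22)$$

$$e^{\alpha x / T} u(H_{[0,x)}) \ge \big(e^{\alpha x / T} - 1\big) u(O) \qquad (A.23)$$

$$u(H_{[0,x)}) \ge \Big(1 - \frac{1}{e^{\alpha x / T}}\Big) u(O) \qquad (A.24)$$

$$u(H) \ge \Big(1 - \frac{1}{e^{\alpha}}\Big) u(O) \qquad (A.25)$$

Setting $x = T$ in (A.24) we get (A.25), which completes the proof. □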
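As a sanity check on the differential inequality, one can verify numerically that the function $y(t) = 1 - e^{-\alpha t / T}$ satisfies $y + (T/\alpha)\,y' = 1$, that is, (A.20) with equality after normalising by $u(O)$, and that $y(T) = 1 - 1/e^{\alpha}$. The sketch below assumes this form of (A.20) and $y(0) = 0$; the function names and the values of `alpha` and `T` are illustrative.

```python
import math


def extremal_solution(t: float, alpha: float, T: float) -> float:
    """y(t) = 1 - exp(-alpha * t / T), the equality case with u(O) = 1."""
    return 1.0 - math.exp(-alpha * t / T)


def lhs_of_a20(t: float, alpha: float, T: float, h: float = 1e-6) -> float:
    """y(t) + (T / alpha) * y'(t), with y' from a central finite difference."""
    dy = (extremal_solution(t + h, alpha, T)
          - extremal_solution(t - h, alpha, T)) / (2.0 * h)
    return extremal_solution(t, alpha, T) + (T / alpha) * dy


alpha, T = 0.8, 10.0
residuals = [abs(lhs_of_a20(t, alpha, T) - 1.0) for t in (0.5, 2.0, 5.0, 9.5)]
final_value = extremal_solution(T, alpha, T)
print(max(residuals), final_value, 1.0 - math.exp(-alpha))
```

The residuals are at the level of finite-difference error, and the value at $t = T$ coincides with the bound $1 - 1/e^{\alpha}$, which suggests the bound cannot be improved from the differential inequality alone.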
for $i \in M$ do
    $B_i \leftarrow$ budget of ad $i$;
end
$B \leftarrow (B_1, \cdots, B_m)$;
while $N \ne \emptyset$ do
    $\Delta' \leftarrow 0$;
    // find the best partial allocation to append
    for $j \in N$ do
        $Y_j \leftarrow \emptyset$;
        // find the best k rewrites greedily
        for $w = 1, \cdots, k$ do
            $\delta' \leftarrow 0$;
            for $r \in R \setminus Y_j$ do
                $\delta \leftarrow u((j, Y_j \cup \{r\}, B) \,|\, H)$;
                if $\delta' < \delta$ then
                    $\delta' \leftarrow \delta$;
                    $r' \leftarrow r$;
                end
            end
            $Y_j \leftarrow Y_j \cup \{r'\}$;
        end
        // compute the marginal utility of adding $(j, Y_j)$
        $\Delta \leftarrow u((j, Y_j, B) \,|\, H)$;
        if $\Delta' < \Delta$ then
            $\Delta' \leftarrow \Delta$;
            $j' \leftarrow j$;
        end
    end
    Define $B^{j'}$ to be exactly equal to how much of the budgets are used by $(j', Y_{j'}, B)$
    when appended to $H$;
    $H \leftarrow H \perp (j', Y_{j'}, B^{j'})$;
    $N \leftarrow N \setminus \{j'\}$;
    $B \leftarrow B - B^{j'}$;
end

Algorithm 3: Query Rewriting Algorithm